An STD system for OOV query terms using various subword units

Authors

Hiroyuki Saito

Takuya Nakano

Shiro Narumi

Toshiaki Chiba

Kazuma Konno

Yoshiaki Itoh

Abstract

We have proposed a Spoken Term Detection (STD) method for Out-Of-Vocabulary (OOV) query terms that uses various subword units, such as monophone, triphone, demiphone, one-third phone, and Sub-phonetic segment (SPS) models. In the proposed method, subword-based automatic speech recognition (ASR) is performed on all spoken documents, and subword recognition results are generated using subword acoustic models and subword language models. When a query term is given, its subword sequence is searched for in the subword sequences of the recognition results for the spoken documents. The two subword sequences are matched by Continuous Dynamic Programming, using acoustic distances between subwords. Demiphone and one-third phone models were newly developed for the STD task. We have also proposed a method that integrates the multiple STD results obtained with each subword model. Each candidate segment has a distance, a segment number, and a document number; the distances obtained from the different subword models are integrated linearly using weighting factors. For the STD tasks of IR for Spoken Documents at NTCIR-9, we apply the various subword models and integrate the multiple STD results obtained from them.
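The matching and integration steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the recurrence, the gap cost, the length normalization, the toy distance table, and all function and variable names (`toy_distance`, `cdp_end_scores`, `integrate`) are assumptions made for illustration. A real system would use acoustic distances derived from the subword acoustic models instead of the hand-written table.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical confusable pairs standing in for real acoustic distances.
CONFUSABLE = {frozenset(("t", "d")), frozenset(("p", "b"))}


def toy_distance(a: str, b: str) -> float:
    """Toy stand-in for an acoustic distance between two subwords."""
    if a == b:
        return 0.0
    return 0.3 if frozenset((a, b)) in CONFUSABLE else 1.0


def cdp_end_scores(
    query: Sequence[str],
    doc: Sequence[str],
    dist: Callable[[str, str], float] = toy_distance,
    gap: float = 1.0,  # assumed cost of skipping a subword on either side
) -> List[float]:
    """For every end position in doc, return the length-normalized distance
    of the best alignment of the whole query that ends there."""
    if not query:
        raise ValueError("query must contain at least one subword")
    # Row 0 is all zeros: a match may start anywhere in the document.
    prev = [0.0] * (len(doc) + 1)
    for i, q in enumerate(query, start=1):
        cur = [i * gap]  # query subwords consumed before any document subword
        for j, s in enumerate(doc, start=1):
            cur.append(min(
                prev[j - 1] + dist(q, s),  # align q with s
                prev[j] + gap,             # q has no counterpart in doc
                cur[j - 1] + gap,          # s has no counterpart in query
            ))
        prev = cur
    return [d / len(query) for d in prev[1:]]


def integrate(distances: Dict[str, float], weights: Dict[str, float]) -> float:
    """Linearly combine one candidate's distances from several subword models."""
    return sum(weights[name] * distances[name] for name in weights)


# Toy usage: the document contains "d o k y o", a near match for the query.
query = ["t", "o", "k", "y", "o"]
doc = ["a", "d", "o", "k", "y", "o", "n", "i"]
scores = cdp_end_scores(query, doc)
best_end = min(range(len(scores)), key=scores.__getitem__)  # 0-based end index
```

Because the first row of the table is zero everywhere, the query may begin at any point in the document, which is what lets a single pass spot the term inside a long recognition result. In the integration step, each key of `weights` names one subword model, and the weighting factors themselves would have to be tuned on development data.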


Similar resources

An STD System for OOV Query Terms Integrating Multiple STD Results of Various Subword units

We have been proposing a Spoken Term Detection (STD) method for Out-Of-Vocabulary (OOV) query terms integrating various subword recognition results using monophone, triphone, demiphone, one third phone, and Sub-phonetic segment (SPS) models. In the proposed method, subword-based ASR (Automatic Speech Recognition) is performed for all spoken documents and subword recognition results are generate...


Constructing Acoustic Distances Between Subwords and States Obtained from a Deep Neural Network for Spoken Term Detection

The detection of out-of-vocabulary (OOV) query terms is a crucial problem in spoken term detection (STD), because OOV query terms are likely. To enable search of OOV query terms in STD systems, a query subword sequence is compared with subword sequences generated using an automatic speech recognizer against spoken documents. When comparing two subword sequences, the edit distance is a typical d...


An IWAPU STD System for OOV Query Terms and Spoken Queries

We have been proposing a Spoken Term Detection (STD) method for Out-Of-Vocabulary (OOV) query terms integrating various subword recognition results using monophone, triphone, demiphone, one third phone, and Sub-phonetic segment (SPS) models[1][2]. In this paper, we describe two methods for text OOV query terms and spoken queries. For text OOV query terms, we introduce four unique methods. First...


An approach for efficient open vocabulary spoken term detection

A hybrid two-pass approach for facilitating fast and efficient open vocabulary spoken term detection (STD) is presented in this paper. A large vocabulary continuous speech recognition (LVCSR) system is deployed for producing word lattices from audio recordings. An index construction technique is used for facilitating very fast search of lattices for finding occurrences of both in vocabulary (IV...


Effect of Pronunciations on OOV Queries in Spoken Term Detection

This paper focusses on the effect of pronunciations for Out-of-Vocabulary (OOV) query terms on the performance of a spoken term detection (STD) task. OOV terms, typically proper names or foreign language terms, occur infrequently but are rich in information. The STD task returns relevant segments of speech that contain one or more of these OOV query terms. The STD system described in this paper i...




Publication date: 2011
